Abstract
A very quick overview of visualizing data with ggplot2. This code can be found at https://github.com/libjohn/workshop_rfun_flipped/ggplot2_quick.Rmd
I only need ggplot2, but I like to load tidyverse because it includes 8 complementary packages, including ggplot2.
# library(ggplot2)
library(tidyverse)
Get more information from:
The ggplot2 template is used to identify the dataframe, identify the x and y axes, and define the visualized layers.
ggplot(data = ---, mapping = aes(x = ---, y = ---)) + geom_----()
Note: ---- is meant to imply text (function names, dataframe names, variable names) you supply.
It is helpful to see the argument mapping, above. In practice, rather than typing the formal arguments, code is typically shorthanded to this:
dataframe %>% ggplot(aes(xvar, yvar)) + geom_----()
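To make the template concrete, here is a minimal sketch that writes one plot both ways. The mpg dataset (bundled with ggplot2) and the displ and hwy variables are my choice for illustration only. Both calls should build the same plot.

```r
library(ggplot2)
library(dplyr)  # provides the %>% pipe

# Formal version: every argument is named
p_formal <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Shorthand version: data is piped in, aes() arguments are matched by position
p_short <- mpg %>%
  ggplot(aes(displ, hwy)) +
  geom_point()

p_short
```

The first two positional arguments of aes() are x and y, which is why the shorthand can drop the names.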
Visualize a scatter plot showing the relationship of mass to height for Star Wars characters in the dplyr::starwars dataframe, excluding the heaviest character. Indicate a linear regression line.
dplyr has an onboard dataset, starwars
data(starwars)
starwars
This feels like, and looks like, you drew an empty box.
starwars %>%
ggplot()
But wait, there’s more….
Still doesn’t look like much. You will initialize the plot scales and labels based on the values of the variables in the dataframe.
starwars %>%
filter(mass < 500) %>%
ggplot(aes(height, mass))
In the above, I subset the data with dplyr::filter(), keeping only the Star Wars characters weighing less than 500 kg. Then I initialized the base layer with height as the x axis and mass as the y axis. ggplot drew the scales for me.
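One side effect of that filter is worth knowing: filter() keeps only rows where the condition is TRUE, so characters with a missing mass are dropped as well. A small sketch to check this (the object names are hypothetical):

```r
library(dplyr)

# Rows where mass is missing
n_missing <- starwars %>% filter(is.na(mass)) %>% nrow()

# filter() drops rows where the condition is FALSE or NA,
# so missing masses disappear along with the heaviest character
kept <- starwars %>% filter(mass < 500)

nrow(starwars)  # all characters
n_missing       # characters with no recorded mass
nrow(kept)      # characters that will be plotted
```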
Since I have two numeric variables, height and mass, I’ll start with a scatter plot. Scatter plots are generated by the geom_point() function.
starwars %>%
filter(mass < 500) %>%
ggplot(aes(height, mass)) +
geom_point()
aes() arguments mapped locally in geom_point()
starwars %>%
filter(mass < 500) %>%
ggplot() +
geom_point(aes(height, mass))
Many arguments can be mapped inside the aesthetic, aes(), to leverage variable values, OR set a visualized property outside the aes() function, but inside the geom_ function.
Aesthetic arguments include:
color is mapped inside the aes() function
starwars %>%
filter(mass < 500) %>%
ggplot() +
# geom_point(mapping = aes(x = height, y = mass, color = gender))
geom_point(aes(height, mass, color = gender))
Notice the legend was drawn automatically, above, by mapping an aesthetic
color is set outside the aes() function
starwars %>%
filter(mass < 500) %>%
ggplot() +
geom_point(aes(height, mass), color = "goldenrod")
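A common slip is to put a literal color name inside aes(). The sketch below contrasts the two cases. The object names are hypothetical and the comparison is only meant as an illustration.

```r
library(ggplot2)
library(dplyr)

plot_data <- starwars %>% filter(mass < 500)

# Set: a constant outside aes() colors every point goldenrod, with no legend
p_set <- ggplot(plot_data) +
  geom_point(aes(height, mass), color = "goldenrod")

# Mapped by mistake: inside aes(), "goldenrod" is treated as a data value,
# so ggplot2 picks a color from its default palette and adds a legend
p_mapped <- ggplot(plot_data) +
  geom_point(aes(height, mass, color = "goldenrod"))

p_mapped
```

If a plot shows an unexpected color along with a one-item legend, this is usually the cause.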
| Type | Geom |
|---|---|
| Bar graph: | geom_bar() geom_col() |
| Histogram: | geom_histogram() |
| Scatter plot: | geom_point() geom_jitter() |
| Line graph: | geom_line() |
| Box plot: | geom_boxplot() |
| Density: | geom_density() geom_violin() |
| Heat map: | geom_tile() geom_raster() |
| Mapping: | geom_sf() |
| Regression line: | geom_smooth() |
A list of available geom_ functions, or layers, can be found in the help or on the website: https://ggplot2.tidyverse.org/reference/index.html#section-geoms
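As one more example from the table, here is a minimal histogram sketch. The choice of the height variable and a bin width of 10 is an assumption for illustration.

```r
library(ggplot2)
library(dplyr)

# A histogram needs only an x aesthetic; ggplot2 computes the counts per bin
p_hist <- starwars %>%
  filter(!is.na(height)) %>%
  ggplot(aes(height)) +
  geom_histogram(binwidth = 10)

p_hist
```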
starwars %>%
mutate(species = fct_lump_min(species, 2)) %>%
ggplot(aes(species, height)) +
geom_boxplot()
babynames::babynames %>%
filter(name == "Watts") %>%
ggplot(aes(year, n)) +
# geom_point() +
geom_line()
There are two simple approaches to visualizing overplotted data.
Use the alpha argument to affect the opacity of the points. In this way, overplotted data will appear as darker points on the plot.
starwars %>%
filter(mass < 500) %>%
ggplot() +
geom_point(aes(height, mass), alpha = .3)
Use geom_jitter(). It will not change the values of the data, but it will offset the data points, making it easier to perceive the overplotting.
starwars %>%
filter(mass < 500) %>%
ggplot() +
geom_jitter(aes(height, mass))
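Because jitter is random, the plot changes slightly on every render. A sketch of one way to control this, assuming a jitter of 2 units in each direction and an arbitrary seed, is to pass position_jitter() to geom_point():

```r
library(ggplot2)
library(dplyr)

# width and height limit how far points may move; seed makes the offsets repeatable
p_jitter <- starwars %>%
  filter(mass < 500) %>%
  ggplot(aes(height, mass)) +
  geom_point(position = position_jitter(width = 2, height = 2, seed = 123))

p_jitter
```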
Each layer can support local arguments and draw from the global settings. Below we use the geom_line() function, followed by the geom_point() function.
babynames %>%
ggplot(aes(year, prop)) +
geom_line(aes(color = sex)) +
geom_point(alpha = 0.4, shape = "cross")
But there is more to that graph, here’s the full code for the above graph.
library(babynames)
library(ggplot2)
babynames %>%
filter(name == "John" & sex == "M" |
name == "Elizabeth" & sex == "F") %>%
ggplot(aes(year, prop)) +
geom_line(aes(color = sex)) +
geom_point(alpha = 0.4, shape = "cross") +
geom_text(data = . %>% filter(year == 1965), aes(label = name),
nudge_y = .009) +
labs(title = "Name Popularity") +
theme(legend.position = "none")
Recall the goal mentioned in the beginning. We want a scatter plot and a regression line. This can be accomplished by adding a layer in the form of another geom_ function: geom_smooth()
starwars %>%
filter(mass < 500) %>%
ggplot(aes(height, mass)) +
geom_point() +
geom_smooth(method = lm, se = FALSE)
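With method = lm, geom_smooth() fits y against x, which here means mass against height. To see the numbers behind the line, a rough sketch is to fit the same model directly (the object names are hypothetical):

```r
library(dplyr)

plot_data <- starwars %>% filter(mass < 500)

# The same kind of model that geom_smooth(method = lm) draws: mass ~ height
fit <- lm(mass ~ height, data = plot_data)

coef(fit)  # intercept and slope of the regression line
```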
Categorical values are most easily ordered with the forcats package. Part of the tidyverse, forcats will convert string data into factors, i.e. categorical data. This enables ordering.
msleep %>%
ggplot(aes(vore)) +
geom_bar()
Change the order of the bars by the frequency of observations.
msleep %>%
ggplot(aes(fct_infreq(vore))) +
geom_bar()
Notice below, we use the fill = argument to set the color of the bar. In the scatter plot, above, we used the color = argument. For many geoms you can use both color and fill. How do these arguments differ? Where can you look to find out more about fill and color?
starwars %>%
ggplot(aes(fct_rev(fct_infreq(eye_color)))) +
geom_bar(fill = "grey70") +
geom_bar(data = starwars %>% filter(eye_color == "orange"), fill = "darkorange") +
coord_flip()
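As a hint for the fill and color question above, this small sketch sets both on the same bars. The msleep data and the specific colors are chosen only for illustration.

```r
library(ggplot2)

# fill paints the inside of each bar; color draws its outline
p_bars <- ggplot(msleep, aes(vore)) +
  geom_bar(fill = "grey70", color = "black")

p_bars
```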
Faceting is a great way to make subplots of the same dataframe. See both facet_wrap() and facet_grid().
mpg %>%
ggplot(aes(displ, hwy)) +
geom_point() +
facet_wrap(~ class)
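For comparison, here is a minimal facet_grid() sketch. It lays panels out by two variables instead of wrapping one; the choice of drv and cyl is an assumption for illustration.

```r
library(ggplot2)

# Rows are drive train (drv), columns are number of cylinders (cyl)
p_grid <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)

p_grid
```

Combinations with no observations still get a panel, which stays empty.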
I’ll briefly introduce the use of scales. In this case, scales are used to affect the color of the plot. Read more about scales.
Viridis scales apply color palettes to continuous, discrete, or binned data
msleep %>%
ggplot(aes(fct_infreq(vore), sleep_total)) +
geom_col(aes(fill = conservation)) +
scale_fill_viridis_d(na.value = "grey80")
The color brewer palette is similar but has a wider array of palettes to choose from.
msleep %>%
ggplot(aes(fct_infreq(vore), sleep_total)) +
geom_col(aes(fill = conservation)) +
scale_fill_brewer(type = "qual", na.value = "grey80")
To find available colors: Google search “R color names”, or specific to ColorBrewer….
#display.brewer.pal(7,"Dark2")
RColorBrewer::display.brewer.all()
Sometimes a manual scale is preferred. I like to google-search: “R color names” for helpful documentation.
mycolors <- c("firebrick", "forestgreen", "navy", "darkorange",
"goldenrod", "sienna")
msleep %>%
ggplot(aes(fct_infreq(vore), sleep_total)) +
geom_col(aes(fill = conservation)) +
scale_fill_manual(values = mycolors, na.value = "grey80")
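An unnamed vector assigns colors in the order of the factor levels. A named vector pins each color to a specific level instead. The sketch below is one way to do that; the pairing of colors to conservation levels is arbitrary and only for illustration.

```r
library(ggplot2)

mycolors <- c("firebrick", "forestgreen", "navy", "darkorange",
              "goldenrod", "sienna")

# Name each color after a conservation level (pairing chosen arbitrarily here)
conservation_levels <- sort(unique(na.omit(msleep$conservation)))
named_colors <- setNames(mycolors, conservation_levels)

p_manual <- ggplot(msleep, aes(vore, sleep_total)) +
  geom_col(aes(fill = conservation)) +
  scale_fill_manual(values = named_colors, na.value = "grey80")

p_manual
```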
Scales are used to manipulate the visual properties of the data. Beyond using scales to modify colors, another example is logarithmic scales to account for data skew. In this way you can clarify the data pattern. For example, using the ChickWeight dataset, we visualize the weights of the chicks over time.
data("ChickWeight")
ChickWeight %>%
ggplot(aes(Time, weight, color = Diet)) +
geom_line(aes(group = Chick))
Using scale_y_log10() we can alter the scale to highlight a more understandable data pattern.
chicken_plot <- ChickWeight %>%
ggplot(aes(Time, weight, color = Diet)) +
geom_line(aes(group = Chick)) +
scale_y_log10()
chicken_plot
The labs() function is a specialized function used to apply labels. For example, use the labs() function to add a title, subtitle, or legend title, to modify axis labels, and to set a caption. See more on scales.
plot_sleep <- msleep %>%
mutate(vore = case_when(
vore == "herbi" ~ "Herbivore",
vore == "omni" ~ "Omnivore",
vore == "carni" ~ "Carnivore",
vore == "insecti" ~ "Insectivore"
)) %>%
ggplot(aes(fct_infreq(vore), sleep_total)) +
geom_col(aes(fill = conservation)) +
scale_fill_brewer(type = "qual", na.value = "grey80") +
labs(title = "Animal sleep times",
subtitle = "A practice dataset",
fill = "Conservation\nType",
x = "",
y = "Sleep time in hours",
caption = "Source: ggplot2::msleep")
plot_sleep
Themes are used to manipulate the stylistic characteristics of the non-data components of your plot, such as font faces, text sizes, and grid lines. ProTip: quickly manipulate a single plot with preset themes such as theme_dark, or use a specialized theme extension such as theme_ipsum from the hrbrthemes package.
https://ggplot2.tidyverse.org/reference/ggtheme.html
theme_dark(), theme_light(), theme_classic()
https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/
See more on themes

plot_sleep +
theme_dark()
plot_sleep +
theme_classic()
https://cinc.rud.is/web/packages/hrbrthemes/
plot_sleep +
hrbrthemes::theme_ipsum(grid = "Y") +
hrbrthemes::scale_fill_ipsum(na.value = "grey80",
labels = c("Critical", "Domesticated",
"Endangered", "Least Concern",
"Threatened", "Vulnerable")) +
theme(plot.title.position = "plot")
The patchwork package makes it “ridiculously simple to combine separate ggplot objects into the same graphic.” See more about patchwork
# install.packages("devtools")
# devtools::install_github("thomasp85/patchwork")
# https://patchwork.data-imaginist.com/
library(patchwork)
(plot_sleep / chicken_plot)
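The `/` operator above stacks plots vertically. A minimal sketch of the side-by-side form, using two small hypothetical plots built from mpg so that it runs on its own:

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg, aes(displ, hwy)) + geom_point()
p2 <- ggplot(mpg, aes(class)) + geom_bar()

# `|` places plots side by side; plot_annotation() adds a shared title
combined <- (p1 | p2) + plot_annotation(title = "Two views of mpg")

combined
```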
Use the ggplotly function to transform your static plot into an interactive plot that can be used in dashboards and web presentations.
See more at the Plotly ggplot2 Library page, and the Interactive web-based data visualization with R, plotly, and shiny book.
library(plotly)
ggplotly(plot_sleep)
Use the gganimate package to bring your plot to life through the wonders of animation. Learn more at the resource page for gganimate

Designing effective visualizations by Dr. Mine Çetinkaya-Rundel - Introduction to Data Science https://introds.org
Data Visualization: A Practical Introduction. Kieran Healy
ggplot2: Elegant Graphics for Data Analysis. Hadley Wickham
Data Visualization with R. Rob Kabacoff
Interactive web-based data visualization with R, plotly, and shiny. Carson Sievert